On Proxy Variables and Categorical Data Fusion
Author
Abstract
The problem of inferring the joint distribution of two categorical variables from knowledge or observations of their marginal distributions, referred to in this paper as categorical data fusion, is relevant in statistical matching, ecological inference, market research, and several related fields. This article organizes the use of proxy variables, as distinguished from other auxiliary variables, in terms of both their effects on the uncertainty of fusion and the techniques of fusion. A measure of the gains in efficiency is provided, which incorporates both the identification uncertainty associated with data fusion and the sampling uncertainty that arises when the theoretical bounds of the uncertainty space are unknown and must be estimated. Several existing techniques for generating fusion distributions (or datasets) are described, and some new ones are proposed. Analysis of real-life data demonstrates empirically that proxy variables can make data fusion more precise and the constructed fusion distribution more plausible.
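The abstract does not say how the bounds of the uncertainty space are computed. As a rough illustration only, the sketch below assumes the classical Fréchet bounds on each cell of a joint table given its two marginals, and shows how conditioning on an auxiliary variable Z (whose conditional marginals are assumed known) can narrow them. The function names and all numbers are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Bounds = List[List[Tuple[float, float]]]


def frechet_bounds(p: List[float], q: List[float]) -> Bounds:
    """Cell-wise Frechet bounds for a joint table with row marginals p
    and column marginals q: max(0, p_i + q_j - 1) <= pi_ij <= min(p_i, q_j)."""
    return [[(max(0.0, pi + qj - 1.0), min(pi, qj)) for qj in q] for pi in p]


def conditional_frechet_bounds(
    pz: List[float],
    p_given_z: List[List[float]],
    q_given_z: List[List[float]],
) -> Bounds:
    """Bounds obtained by applying the Frechet bounds within each level z
    of an auxiliary variable Z and averaging them with weights P(Z = z).
    Assumes the conditional marginals given Z are known (illustrative)."""
    n_rows, n_cols = len(p_given_z[0]), len(q_given_z[0])
    lower = [[0.0] * n_cols for _ in range(n_rows)]
    upper = [[0.0] * n_cols for _ in range(n_rows)]
    for w, p, q in zip(pz, p_given_z, q_given_z):
        for i, row in enumerate(frechet_bounds(p, q)):
            for j, (lo, hi) in enumerate(row):
                lower[i][j] += w * lo
                upper[i][j] += w * hi
    return [list(zip(lo_row, hi_row)) for lo_row, hi_row in zip(lower, upper)]


# Made-up marginals for two binary variables.
plain = frechet_bounds([0.6, 0.4], [0.7, 0.3])

# Made-up conditional marginals that average back to the marginals above.
with_z = conditional_frechet_bounds(
    pz=[0.5, 0.5],
    p_given_z=[[0.9, 0.1], [0.3, 0.7]],
    q_given_z=[[0.8, 0.2], [0.6, 0.4]],
)

print("bounds for cell (0, 0) without Z:", plain[0][0])
print("bounds for cell (0, 0) with Z:   ", with_z[0][0])
```

In this toy case the interval for the first cell shrinks once Z is used, which is the kind of reduction in identification uncertainty the abstract attributes to proxy variables. The sketch ignores sampling uncertainty entirely.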
Similar resources
On the uncertainty and techniques of categorical data fusion
Statistical matching (or data fusion) has long been used to merge separate data files in order to generate a joint fusion data set. Since the target joint data are not observable, it is recognized that, in addition to sampling variations that exist in the separate data files, there is an identification uncertainty associated with the assumptions that underpin the fusion procedure. In this paper...
Multinomial logistic regression model with missing values and its application to the study of goiter
In large-scale sampling operations (e.g. nation-wide health surveys) we always face the problem of item non-response and/or unit non-response. In fitting a model to the data we have two groups of variables, namely dependent and independent variables. Non-response may occur in either group. In this paper we take Y to be a categorical dependent variable with three levels...
Town trip forecasting based on data mining techniques
In this paper, a data mining approach is proposed for predicting the duration of town trips (travel time) in New York City. First, two novel approaches, one mathematical and one statistical, are proposed for grouping categorical variables with a huge number of levels. The proposed approaches work based on the cost matrix generated by repetitive post-hoc tests f...
Fractured Reservoirs History Matching based on Proxy Model and Intelligent Optimization Algorithms
In this paper, a new robust approach based on the Least Squares Support Vector Machine (LSSVM) as a proxy model is used for automatic history matching of fractured reservoirs. The proxy model is built to model the history match objective function (mismatch values) based on the history data of the field. This model is then used to minimize the objective function through Particle Swarm Optimization (...
Presenting a structural model to explain academic Burnout of medical sciences students based on thought action fusion, emotion control and imposter syndrome
Psychological variables in university environments, which are diverse in terms of individual and personality differences, increase student adaptability and affect academic performance. The purpose of this study was to determine the relationship of thought-action fusion and emotion control with the symptoms of academic burnout in students through the mediating role of imposter syn...
Journal title:
Volume, issue
Pages -
Publication date: 2015